Want to see how some of your variables relate to many others? Here’s an example of just this:

library(tidyr)
library(ggplot2)

mtcars %>%
  gather(-mpg, -hp, -cyl, key = "var", value = "value") %>% 
  ggplot(aes(x = value, y = mpg, color = hp, shape = factor(cyl))) + geom_point() +
  facet_wrap(~ var, scales = "free") + theme_bw()

This plot shows a separate scatter plot panel for each of many variables against mpg; all points are coloured by hp, and the shapes refer to cyl.

Let’s break it down.

Tidying our data

We’ll make use of the facet_wrap() function in the ggplot2 package, but doing so requires some careful data prep. Thus, assuming our data frame has all the variables we’re interested in, the first step is to get our data into a tidy form that is suitable for plotting.

We’ll do this using gather() from the tidyr package. Wwe gathered all of our variables as follows (using mtcars as our example data set):

library(tidyr)
mtcars %>% gather() %>% head()

This gives us a key column with the variable names and a value column with their corresponding values. This works well if we only want to plot each variable by itself (e.g., to get univariate information). However, here we’re interested in visualising multivariate information, with a particular focus on one or two variables. We’ll start with the bivariate case. Within gather(), we’ll first drop our variable of interest (say mpg) as follows:

mtcars %>% gather(-mpg, key = "var", value = "value") %>% head()

We now have an mpg column with the values of mpg repeated for each variable in the var column. The value column contains the values corresponding to the variable in the var column. This simple extension is how we can use gather() to get our data into shape.

Creating the plot

We want a scatter plot of mpg with each variable in the var column, whose values are in the value column. Creating a scatter plot is handled by ggplot() and geom_point(). Getting a separate panel for each variable is handled by facet_wrap(). We also want the scales for each panel to be “free”. Otherwise, ggplot will constrain them all the be equal, which doesn’t make sense for plotting different variables. For a clean look, let’s also add theme_bw().

mtcars %>%
  gather(-mpg, key = "var", value = "value") %>%
  ggplot(aes(x = value, y = mpg)) +  geom_point() + facet_wrap(~ var, scales = "free") + theme_bw()

We now have a scatter plot of every variable against mpg. Let’s see what else we can do.

Extracting more than one variable

We can layer other variables into these plots. For example, say we want to colour the points based on hp. To do this, we also drop hp within gather(), and then include it appropriately in the plotting stage:

mtcars %>%
  gather(-mpg, -hp, key = "var", value = "value") %>% head()

mtcars %>%
  gather(-mpg, -hp, key = "var", value = "value") %>%
  ggplot(aes(x = value, y = mpg, color = hp)) + geom_point() + facet_wrap(~ var, scales = "free") + theme_bw()

Let’s go crazy and change the point shape by cyl:

mtcars %>%
  gather(-mpg, -hp, -cyl, key = "var", value = "value") %>% head()

mtcars %>%
  gather(-mpg, -hp, -cyl, key = "var", value = "value") %>%
  ggplot(aes(x = value, y = mpg, color = hp, shape = factor(cyl))) +  geom_point() +
    facet_wrap(~ var, scales = "free") + theme_bw()

Perks of ggplot2

If you’re familiar with ggplot2, you can go to town. For example, let’s add loess lines with stat_smooth():

mtcars %>%
  gather(-mpg, key = "var", value = "value") %>%
  ggplot(aes(x = value, y = mpg)) + geom_point() + stat_smooth() + facet_wrap(~ var, scales = "free") + theme_bw()


ZellW/LearnPlotting documentation built on May 10, 2019, 1:57 a.m.